
Creators/Authors contains: "Li, Wentao"

  1. Nikolski, Macha (Ed.)
    Abstract

    Motivation

    Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal security guarantees, their substantial communication and computational overheads hinder their practical use in large-scale collaborative GWAS.

    Results

    This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets that maintains data privacy without compromising the accuracy of the results. We propose a novel two-step strategy to reduce communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a statistical method commonly used to identify associations between genetic markers and a phenotype of interest. We evaluate the proposed methods on two real genomic datasets and, across a variety of experimental settings, demonstrate their robustness to between-study heterogeneity and skewed phenotype distributions. The empirical results show the efficiency and applicability of the proposed method and its promise for large-scale collaborative GWAS.
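    The abstract does not spell out the two-step protocol, so the sketch below illustrates only the underlying statistics: a per-SNP logistic regression (logit P(case) = b0 + b1 * genotype) fitted by Newton-Raphson, where each site shares just its local gradient and Hessian and a coordinator sums them. This is a generic federated-Newton sketch, not the TDS method; the function names, the 0/1/2 genotype coding, and the toy data are hypothetical. Sharing aggregates is not by itself a formal privacy guarantee, and the paper's cryptographic layer is not modeled.

```python
import math
from typing import List, Tuple

# Hypothetical record format: (genotype in {0, 1, 2}, phenotype in {0, 1}).
Site = List[Tuple[int, int]]


def local_stats(site: Site, beta: Tuple[float, float]):
    """Score vector and information matrix X^T W X for logit P(y=1) = b0 + b1*g,
    computed only from this site's records."""
    b0, b1 = beta
    g0 = g1 = 0.0          # score (gradient of the log-likelihood)
    h00 = h01 = h11 = 0.0  # entries of X^T W X (negative Hessian)
    for g, y in site:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
        r, w = y - p, p * (1.0 - p)
        g0 += r
        g1 += r * g
        h00 += w
        h01 += w * g
        h11 += w * g * g
    return (g0, g1), (h00, h01, h11)


def federated_newton(sites: List[Site], iters: int = 25, tol: float = 1e-10):
    """Newton-Raphson in which only per-site aggregates reach the coordinator."""
    beta = (0.0, 0.0)
    for _ in range(iters):
        G0 = G1 = H00 = H01 = H11 = 0.0
        for site in sites:
            (g0, g1), (h00, h01, h11) = local_stats(site, beta)
            G0 += g0
            G1 += g1
            H00 += h00
            H01 += h01
            H11 += h11
        det = H00 * H11 - H01 * H01
        # Newton step: (X^T W X)^{-1} times the score, 2x2 inverse written out.
        d0 = (H11 * G0 - H01 * G1) / det
        d1 = (-H01 * G0 + H00 * G1) / det
        beta = (beta[0] + d0, beta[1] + d1)
        if abs(d0) + abs(d1) < tol:
            break
    # Standard error of b1 from the last evaluated information matrix.
    se_b1 = math.sqrt(H00 / det)
    return beta, se_b1
```

A Wald statistic for the SNP is then `beta[1] / se_b1`. Because the sums are additive across sites, the federated fit matches a fit on the pooled data up to floating-point error, which is the property collaborative GWAS relies on.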

    Availability and implementation

    The source code and data are available at https://github.com/amioamo/TDS.

     
    Free, publicly-accessible full text available October 1, 2024
  2. Approximate confidence distribution computing (ACDC) offers a new take on the rapidly developing field of likelihood-free inference from within a frequentist framework. The appeal of this computational method for statistical inference hinges on the concept of a confidence distribution, a special type of estimator defined with respect to the repeated sampling principle. An ACDC method provides frequentist validation for computational inference in problems with unknown or intractable likelihoods. The main theoretical contribution of this work is the identification of a matching condition necessary for the frequentist validity of inference from this method. In addition to providing an example of how a modern understanding of confidence distribution theory can connect the Bayesian and frequentist inferential paradigms, we make the case for expanding the current scope of so-called approximate Bayesian inference to include non-Bayesian inference by targeting a confidence distribution rather than a posterior. The main practical contribution of this work is the development of a data-driven approach to ACDC in both Bayesian and frequentist contexts. ACDC is made data-driven through the selection of a data-dependent proposal function, whose structure is quite general and adaptable to many settings. We explore three numerical examples that both verify the theoretical arguments in the development of ACDC and suggest instances in which ACDC outperforms approximate Bayesian computation (ABC) methods computationally.
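    To make the "data-dependent proposal" idea concrete, here is a minimal accept-reject sketch in the spirit of ACDC, under illustrative assumptions: a N(theta, sigma^2) model with known sigma, the sample mean as summary statistic, and a normal proposal centered on the observed sample mean (in place of a prior) with a hypothetical spread of twice the standard error. Draws are kept when the simulated summary lands within eps of the observed one. The function name and all tuning values are assumptions; this toy does not check the matching condition that the paper identifies as necessary for frequentist validity.

```python
import random
import statistics


def acdc_rejection(x_obs, n_draws=10000, eps=0.05, sigma=1.0, seed=0):
    """Accept-reject sampler for the mean of a N(theta, sigma^2) model.

    Proposal: normal centered at the observed sample mean (data-dependent,
    not a prior). Proposal spread, eps, and sigma are illustrative choices.
    """
    rng = random.Random(seed)
    n = len(x_obs)
    s_obs = statistics.fmean(x_obs)          # observed summary statistic
    prop_sd = 2.0 * sigma / n ** 0.5         # hypothetical: 2x the standard error
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(s_obs, prop_sd)    # draw from the data-dependent proposal
        x_sim = [rng.gauss(theta, sigma) for _ in range(n)]
        if abs(statistics.fmean(x_sim) - s_obs) <= eps:
            accepted.append(theta)           # keep theta if summaries match
    return accepted
```

The accepted values concentrate around the sample mean with a spread well below the proposal's. Centering the proposal on the data wastes far fewer simulations than a diffuse prior would, which is the kind of computational gain over ABC that the abstract alludes to.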

     
  3. Abstract

    Background

    Estimation of genetic relatedness, or kinship, is occasionally used for recreational purposes and in forensic applications. Although numerous methods have been developed to estimate kinship, they have high computational requirements and often make the untenable assumption that the samples share a homogeneous population ancestry. Moreover, genetic privacy is generally overlooked when kinship estimation methods are used. Finding unknown familial relationships in third-party databases can raise ethical concerns. Similar concerns may arise when estimating and reporting sensitive population-level statistics such as inbreeding coefficients, because of the risk of marginalization and stigmatization.

    Results

    Here, we present SIGFRIED, which uses existing reference panels with a projection-based approach that simplifies kinship estimation in admixed populations. We use simulated and real datasets to demonstrate the accuracy and efficiency of this kinship estimation. We also present a secure federated kinship estimation framework and implement a secure kinship estimator using homomorphic encryption-based primitives to compute relatedness between samples at two different sites while the genotype data remain confidential. Source code and documentation for our methods can be found at https://doi.org/10.5281/zenodo.7053352.
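    The abstract does not give SIGFRIED's estimator. As a hedged illustration of how reference panels can help with admixed samples, the sketch assumes that each individual's expected allele frequency at a SNP is the mix of reference-population frequencies weighted by that individual's ancestry proportions (a hypothetical "projection" step), and plugs these individual-specific frequencies into a PC-Relate-style kinship estimator. The estimator is swapped in for illustration and is not claimed to be the authors' formula; the function names and simulated panel are hypothetical.

```python
import math
import random
from typing import List, Sequence


def individual_freqs(ancestry: Sequence[float],
                     ref_freqs: List[Sequence[float]]) -> List[float]:
    """Hypothetical projection step: weight each reference population's allele
    frequencies by this individual's ancestry proportions."""
    n_snps = len(ref_freqs[0])
    return [sum(q * pop[m] for q, pop in zip(ancestry, ref_freqs))
            for m in range(n_snps)]


def kinship(g_i: Sequence[int], g_j: Sequence[int],
            mu_i: Sequence[float], mu_j: Sequence[float]) -> float:
    """PC-Relate-style kinship using individual-specific allele frequencies.
    Genotypes count alternate alleles (0, 1, 2)."""
    num = den = 0.0
    for gi, gj, pi, pj in zip(g_i, g_j, mu_i, mu_j):
        if 0.0 < pi < 1.0 and 0.0 < pj < 1.0:  # skip monomorphic expectations
            num += (gi - 2 * pi) * (gj - 2 * pj)
            den += math.sqrt(pi * (1 - pi) * pj * (1 - pj))
    return num / (4.0 * den)
```

For a non-inbred individual compared with itself the estimate is close to 0.5, and for unrelated individuals with different admixture proportions it is close to 0. Without the individual-specific frequencies, ancestry differences would leak into the kinship signal.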

    Conclusions

    Analysis of relatedness is fundamentally important for identifying relatives, for association studies, and for population-level estimates of inbreeding. As awareness of individual and group genomic privacy grows, privacy-preserving methods for estimating relatedness are needed. The presented methods alleviate the ethical and privacy concerns in the analysis of relatedness in admixed, historically isolated, and underrepresented populations.

    Short Abstract

    Genetic relatedness is a central quantity used for finding relatives in databases, correcting biases in genome-wide association studies, and estimating population-level statistics. Methods for estimating genetic relatedness have high computational requirements and sometimes do not account for individuals of admixed ancestry. Furthermore, these methods do not consider the ethical concerns around using genetic data and calculating relatedness. We present a projection-based approach that can efficiently and accurately estimate kinship. We implement our method using encryption-based techniques that provide provable security guarantees, protecting genetic data while kinship statistics are computed across multiple sites.
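    The abstracts state only that homomorphic-encryption primitives are used. To make the two-site idea concrete, the sketch swaps in textbook Paillier encryption, which is additively homomorphic. It is not necessarily the scheme the authors use, and the keys here are deliberately tiny and insecure. Site A encrypts its fixed-point-encoded vector. Site B combines those ciphertexts with its own plaintext values without seeing A's data. Only A can decrypt the resulting inner product, the kind of cross-site sum that appears in the numerator of genotype-based kinship estimators. All names, the scale factor, and the primes are illustrative assumptions. Results must stay below N/2 in magnitude to decode correctly.

```python
import math
import random
from typing import List, Sequence

# Toy Paillier parameters from two well-known primes (the 10,000th and
# 100,000th primes). Far too small to be secure: illustration only.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # Carmichael lambda(N)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); valid since g = N + 1


def encrypt(m: int, rng: random.Random) -> int:
    """Paillier encryption with generator g = N + 1; negative m wraps mod N.
    Real code would draw r with the `secrets` module, not `random`."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Recover m and map it back to a signed integer in (-N/2, N/2]."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def site_b_dot(enc_a: List[int], b: Sequence[int]) -> int:
    """Site B combines A's ciphertexts with its own plaintext integers:
    Enc(x)^k encrypts k*x and Enc(x)*Enc(y) encrypts x+y (both mod N)."""
    acc = 1  # encryption of 0 with randomness r = 1
    for c, k in zip(enc_a, b):
        acc = acc * pow(c, k % N, N2) % N2
    return acc


def secure_dot(a: Sequence[float], b: Sequence[float],
               scale: int = 1000, seed: int = 0) -> float:
    """Fixed-point inner product: A encrypts, B computes blindly, A decrypts."""
    rng = random.Random(seed)
    enc_a = [encrypt(round(x * scale), rng) for x in a]            # at site A
    enc_result = site_b_dot(enc_a, [round(y * scale) for y in b])  # at site B
    return decrypt(enc_result) / (scale * scale)                   # at site A
```

For example, `secure_dot([0.5, -1.2], [0.8, 0.3])` recovers 0.04 while site B only ever handles ciphertexts. A production system would use vetted parameters and a maintained library rather than this hand-rolled toy.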

     
  4. A rapid and sensitive method is described for measuring perchlorate (ClO₄⁻), chlorate (ClO₃⁻), chlorite (ClO₂⁻), bromate (BrO₃⁻), and iodate (IO₃⁻) ions in natural and treated waters using non-suppressed ion chromatography with electrospray ionization and tandem mass spectrometry (NS-IC-MS/MS). Major benefits of the NS-IC-MS/MS method include a short analysis time (12 minutes), low limits of quantification for BrO₃⁻ (0.10 μg L⁻¹), ClO₄⁻ (0.06 μg L⁻¹), ClO₃⁻ (0.80 μg L⁻¹), and ClO₂⁻ (0.40 μg L⁻¹), and compatibility with conventional LC-MS/MS instrumentation. Chromatographic separations were generally performed under isocratic conditions with a Thermo Scientific Dionex AS16 column, using a mobile phase of 20% 1 M aqueous methylamine and 80% acetonitrile. For IO₃⁻ analysis, the isocratic method can be optimized by adding a gradient from the isocratic mobile phase to 100% 1 M aqueous methylamine. Four common anions (Cl⁻, Br⁻, SO₄²⁻, and HCO₃⁻/CO₃²⁻), a natural organic matter isolate (Suwannee River NOM), and several real water samples were tested to examine the influence of natural water constituents on oxyhalide detection. Only ClO₂⁻ quantification was significantly affected, namely by elevated chloride concentrations (>2 mM) and by NOM. The method was successfully applied to quantify oxyhalides in natural waters, chlorinated tap water, and waters subjected to advanced oxidation by sunlight-driven photolysis of free available chlorine (sunlight/FAC). Sunlight/FAC treatment of NOM-free waters containing 200 μg L⁻¹ Br⁻ formed up to 263 ± 35 μg L⁻¹ and 764 ± 54 μg L⁻¹ ClO₃⁻, and up to 20.1 ± 1.0 μg L⁻¹ and 33.8 ± 1.0 μg L⁻¹ BrO₃⁻ (at pH 6 and 8, respectively). NOM strongly inhibited ClO₃⁻ and BrO₃⁻ formation, likely by scavenging reactive oxygen or halogen species. Because prior work shows that the sunlight/FAC process gives the greatest benefit for disinfecting chlorine-resistant microorganisms in waters with lower DOC levels and higher pH, it may be desirable to limit its application to waters with moderate DOC concentrations (e.g., ∼1–2 mg C L⁻¹), low Br⁻ concentrations (e.g., <50 μg L⁻¹), and circumneutral to moderately alkaline pH (e.g., pH 7–8), striking a balance between maximizing microbial inactivation and minimizing formation of oxyhalides and other disinfection byproducts.
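    The abstract reports mass concentrations. Converting them to molar units shows what fraction of the spiked bromide ended up as bromate. The sketch below is our own illustrative arithmetic, not a result stated in the abstract. It assumes that all bromate derives from the 200 μg L⁻¹ Br⁻ spike, and it uses rounded standard atomic weights. The helper names are hypothetical.

```python
# Rounded standard atomic weights, g/mol.
M_BR = 79.904
M_O = 15.999
M_BRO3 = M_BR + 3 * M_O  # bromate, about 127.901 g/mol


def ug_per_l_to_umol_per_l(conc_ug_l: float, molar_mass: float) -> float:
    """Micrograms per liter divided by g/mol gives micromoles per liter."""
    return conc_ug_l / molar_mass


def bromide_to_bromate_pct(br_ug_l: float, bro3_ug_l: float) -> float:
    """Percent molar conversion of Br- to BrO3- (one Br atom per bromate ion)."""
    br = ug_per_l_to_umol_per_l(br_ug_l, M_BR)
    bro3 = ug_per_l_to_umol_per_l(bro3_ug_l, M_BRO3)
    return 100.0 * bro3 / br
```

With the reported maxima, `bromide_to_bromate_pct(200, 33.8)` gives about 10.6% at pH 8 and `bromide_to_bromate_pct(200, 20.1)` about 6.3% at pH 6. This helps explain why the authors suggest restricting the process to low-bromide waters.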
  5. Abstract

    Despite the excellent physical properties of single‐component Eu³⁺–Tb³⁺‐containing metallopolymers, developing flexible white polymer light‐emitting diodes (WPLEDs) from them for portable full‐color flat displays remains a formidable challenge. Herein, WPLEDs based on the metallopolymer Poly(NVK‐co‐2‐co‐7) are reported, in which [Eu(DBM)₃(4‐vp‐PBI)] (2) and [Tb(tba‐PMP)₃(4‐vp‐PBI)] (7), which have different local environments, are grafted into poly(N‐vinylcarbazole) (PVK). In this design, both Dexter and Förster energy transfer occur, directly producing high‐quality white light with a photoluminescence quantum yield of up to 22.3%. Owing to the stepwise alignment of the frontier molecular orbitals of Poly(NVK‐co‐2‐co‐7) as the emitting layer, combined with CBP‐ and BCP‐assisted carrier transport, a reliable WPLED is achieved with record electroluminescent performance (Lmax = 388.0 cd m⁻², ηc,max = 31.1 cd A⁻¹, ηp,max = 15.0 lm W⁻¹, ηEQE,max = 18.1%, and weak efficiency roll‐off) among previously reported organo‐Ln³⁺‐based white organic light‐emitting diodes and WPLEDs. This finding makes single‐component Eu³⁺–Tb³⁺‐containing metallopolymers a potential new platform for cost‐effective flexible WPLEDs in practical applications.

     